perf(benchmark): add executor-aware metrics, automated benchmarking pipeline, and performance report#1533

Open
neetance wants to merge 2 commits into hyperledger-labs:main from neetance:perf/benchmark-executor-metrics-and-report

Conversation

@neetance
Contributor

Summary

Extends the benchmarking framework to support executor-aware analysis and improves visibility into system behavior under different execution strategies.

Changes

run_benchmarks.py:

  • New --executor flag (serial|unbounded|pool|all, default all) loops over
    all three executor strategies for every parallel benchmark
  • New --proof_type flag (bf|csp|all, default bf) loops over proof systems
  • New --duration and --cpus flags for easy CLI control
  • Column naming: TestParallelBenchmarkSender[pool]/8 tps encodes executor
    so all strategies coexist in one CSV row
  • Goroutine count parsed from 'Goroutines Created' field in runner output
    and stored as TestParallelBenchmarkSender[pool]/8 goroutines

plot_benchmark_results.py:

  • Left plot: TPS vs workers with one coloured line per executor strategy
  • Right plot: TPS vs mean latency (error bars = std, X = p95) with worker
    count annotations, coloured by executor strategy
  • Backward compatible: skips gracefully if executor columns are absent

runner.go:

  • GoRoutinesCreated field added to Result, captured as net delta of runtime.NumGoroutine() across the recording window
  • Printed in System Health section as Goroutines Created

Benchmark Results

I ran the benchmark across the three strategies with 10 workers to show the number of goroutines created; here are the results:

  • Serial

go test ./token/core/zkatdlog/nogh/v1/validator     -test.run=TestParallelBenchmarkValidatorTransfer     -test.v -test.timeout 0     -bits="32" -curves="BLS12_381_BBS_GURVY"     -num_inputs="2" -num_outputs="2"     -workers="10" -duration="30s" -setup_samples=128     -executor="serial"
=== RUN   TestParallelBenchmarkValidatorTransfer
=== RUN   TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers
Metric           Value     Description
------           -----     -----------
Workers          10        
Total Ops        10395     (Robust Sample)
Duration         30.024s   (Good Duration)
Real Throughput  346.22/s  Observed Ops/sec (Wall Clock)
Pure Throughput  346.48/s  Theoretical Max (Low Overhead)

Latency Distribution:
 Min           19.877017ms  
 P50 (Median)  28.467144ms  
 Average       28.861544ms  
 P95           34.022228ms  
 P99           39.539328ms  
 P99.9         48.759763ms  
 Max           54.36025ms   (Stable Tail)

Stability Metrics:
 Std Dev  3.181664ms  
 IQR      3.871098ms  Interquartile Range
 Jitter   2.765293ms  Avg delta per worker
 CV       11.02%      Moderate Variance (10-20%)

System Health & Reliability:
 Error Rate          0.0000%         (100% Success) (0 errors)
 Memory              711008 B/op     Allocated bytes per operation
 Allocs              7755 allocs/op  Allocations per operation
 Alloc Rate          233.02 MB/s     Memory pressure on system
 GC Overhead         4.50%           (High GC Pressure)
 GC Pause            1.350151185s    Total Stop-The-World time
 GC Cycles           3414            Full garbage collection cycles
 Goroutines Created  49              Net goroutines above baseline during recording

Latency Heatmap (Dynamic Range):
Range                     Freq  Distribution Graph
 19.877017ms-20.902475ms  1      (0.0%)
 20.902475ms-21.980837ms  4      (0.0%)
 21.980837ms-23.114832ms  50     (0.5%)
 23.114832ms-24.307329ms  300   █████ (2.9%)
 24.307329ms-25.561348ms  920   ██████████████████ (8.9%)
 25.561348ms-26.880062ms  1596  ███████████████████████████████ (15.4%)
 26.880062ms-28.266809ms  2028  ████████████████████████████████████████ (19.5%)
 28.266809ms-29.725098ms  1919  █████████████████████████████████████ (18.5%)
 29.725098ms-31.258621ms  1629  ████████████████████████████████ (15.7%)
 31.258621ms-32.871258ms  1035  ████████████████████ (10.0%)
 32.871258ms-34.567091ms  503   █████████ (4.8%)
 34.567091ms-36.350413ms  190   ███ (1.8%)
 36.350413ms-38.225736ms  92    █ (0.9%)
 38.225736ms-40.197808ms  37     (0.4%)
 40.197808ms-42.271619ms  38     (0.4%)
 42.271619ms-44.452418ms  19     (0.2%)
 44.452418ms-46.745726ms  14     (0.1%)
 46.745726ms-49.157345ms  10     (0.1%)
 49.157345ms-51.69338ms   5      (0.0%)
 51.69338ms-54.36025ms    5      (0.0%)

--- Analysis & Recommendations ---
[INFO] High Allocations (7755/op). This will trigger frequent GC cycles and increase Max Latency.
----------------------------------

--- Throughput Timeline ---
Timeline: [▇▇▇█▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆] (Max: 370 ops/s)

--- PASS: TestParallelBenchmarkValidatorTransfer (55.99s)
    --- PASS: TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers (55.97s)
PASS
ok      github.com/hyperledger-labs/fabric-token-sdk/token/core/zkatdlog/nogh/v1/validator      56.018s

  • Pool

go test ./token/core/zkatdlog/nogh/v1/validator     -test.run=TestParallelBenchmarkValidatorTransfer     -test.v -test.timeout 0     -bits="32" -curves="BLS12_381_BBS_GURVY"     -num_inputs="2" -num_outputs="2"     -workers="10" -duration="30s" -setup_samples=128     -executor="pool"
=== RUN   TestParallelBenchmarkValidatorTransfer
=== RUN   TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers
Metric           Value     Description
------           -----     -----------
Workers          10        
Total Ops        11106     (Robust Sample)
Duration         30.018s   (Good Duration)
Real Throughput  369.98/s  Observed Ops/sec (Wall Clock)
Pure Throughput  370.31/s  Theoretical Max (Low Overhead)

Latency Distribution:
 Min           17.50233ms   
 P50 (Median)  26.55083ms   
 Average       27.00462ms   
 P95           32.95457ms   
 P99           37.392969ms  
 P99.9         52.041343ms  
 Max           71.166264ms  (Stable Tail)

Stability Metrics:
 Std Dev  3.583152ms  
 IQR      4.125824ms  Interquartile Range
 Jitter   3.26569ms   Avg delta per worker
 CV       13.27%      Moderate Variance (10-20%)

System Health & Reliability:
 Error Rate          0.0000%         (100% Success) (0 errors)
 Memory              711489 B/op     Allocated bytes per operation
 Allocs              7766 allocs/op  Allocations per operation
 Alloc Rate          249.11 MB/s     Memory pressure on system
 GC Overhead         5.09%           (Severe GC Thrashing)
 GC Pause            1.528197863s    Total Stop-The-World time
 GC Cycles           3396            Full garbage collection cycles
 Goroutines Created  201             Net goroutines above baseline during recording

Latency Heatmap (Dynamic Range):
Range                     Freq  Distribution Graph
 17.50233ms-18.773912ms   11     (0.1%)
 18.773912ms-20.137877ms  49     (0.4%)
 20.137877ms-21.600938ms  235   ███ (2.1%)
 21.600938ms-23.170293ms  847   ████████████ (7.6%)
 23.170293ms-24.853665ms  1907  ████████████████████████████ (17.2%)
 24.853665ms-26.659337ms  2659  ████████████████████████████████████████ (23.9%)
 26.659337ms-28.596196ms  2418  ████████████████████████████████████ (21.8%)
 28.596196ms-30.673772ms  1557  ███████████████████████ (14.0%)
 30.673772ms-32.902288ms  855   ████████████ (7.7%)
 32.902288ms-35.29271ms   355   █████ (3.2%)
 35.29271ms-37.856802ms   117   █ (1.1%)
 37.856802ms-40.607181ms  38     (0.3%)
 40.607181ms-43.557381ms  15     (0.1%)
 43.557381ms-46.721919ms  13     (0.1%)
 46.721919ms-50.116368ms  13     (0.1%)
 50.116368ms-53.757431ms  9      (0.1%)
 53.757431ms-57.663025ms  5      (0.0%)
 57.663025ms-61.852368ms  1      (0.0%)
 66.346077ms-71.166264ms  2      (0.0%)

--- Analysis & Recommendations ---
[INFO] High Allocations (7766/op). This will trigger frequent GC cycles and increase Max Latency.
----------------------------------

--- Throughput Timeline ---
Timeline: [▇▇█▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█▇▇▇▇▆▇▇] (Max: 388 ops/s)

--- PASS: TestParallelBenchmarkValidatorTransfer (44.47s)
    --- PASS: TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers (44.45s)
PASS
ok      github.com/hyperledger-labs/fabric-token-sdk/token/core/zkatdlog/nogh/v1/validator      44.492s

  • Unbounded

go test ./token/core/zkatdlog/nogh/v1/validator     -test.run=TestParallelBenchmarkValidatorTransfer     -test.v -test.timeout 0     -bits="32" -curves="BLS12_381_BBS_GURVY"     -num_inputs="2" -num_outputs="2"     -workers="10" -duration="30s" -setup_samples=128     -executor="unbounded"
=== RUN   TestParallelBenchmarkValidatorTransfer
=== RUN   TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers
Metric           Value     Description
------           -----     -----------
Workers          10        
Total Ops        10595     (Robust Sample)
Duration         30.022s   (Good Duration)
Real Throughput  352.91/s  Observed Ops/sec (Wall Clock)
Pure Throughput  353.17/s  Theoretical Max (Low Overhead)

Latency Distribution:
 Min           18.687629ms  
 P50 (Median)  27.975221ms  
 Average       28.315086ms  
 P95           34.032515ms  
 P99           38.579637ms  
 P99.9         60.769941ms  
 Max           70.516977ms  (Stable Tail)

Stability Metrics:
 Std Dev  3.66511ms   
 IQR      4.111613ms  Interquartile Range
 Jitter   3.322761ms  Avg delta per worker
 CV       12.94%      Moderate Variance (10-20%)

System Health & Reliability:
 Error Rate          0.0000%         (100% Success) (0 errors)
 Memory              713795 B/op     Allocated bytes per operation
 Allocs              7755 allocs/op  Allocations per operation
 Alloc Rate          237.49 MB/s     Memory pressure on system
 GC Overhead         5.29%           (Severe GC Thrashing)
 GC Pause            1.589542611s    Total Stop-The-World time
 GC Cycles           3497            Full garbage collection cycles
 Goroutines Created  0               Net goroutines above baseline during recording

Latency Heatmap (Dynamic Range):
Range                     Freq  Distribution Graph
 18.687629ms-19.970602ms  15     (0.1%)
 19.970602ms-21.341657ms  79    █ (0.7%)
 21.341657ms-22.80684ms   223   ███ (2.1%)
 22.80684ms-24.372613ms   704   ███████████ (6.6%)
 24.372613ms-26.045882ms  1653  ███████████████████████████ (15.6%)
 26.045882ms-27.834027ms  2434  ████████████████████████████████████████ (23.0%)
 27.834027ms-29.744934ms  2431  ███████████████████████████████████████ (22.9%)
 29.744934ms-31.787033ms  1688  ███████████████████████████ (15.9%)
 31.787033ms-33.969329ms  827   █████████████ (7.8%)
 33.969329ms-36.301447ms  363   █████ (3.4%)
 36.301447ms-38.793674ms  79    █ (0.7%)
 38.793674ms-41.457002ms  34     (0.3%)
 41.457002ms-44.303176ms  15     (0.1%)
 44.303176ms-47.344751ms  13     (0.1%)
 47.344751ms-50.595141ms  12     (0.1%)
 50.595141ms-54.068682ms  5      (0.0%)
 54.068682ms-57.780695ms  4      (0.0%)
 57.780695ms-61.747551ms  7      (0.1%)
 61.747551ms-65.986745ms  2      (0.0%)
 65.986745ms-70.516977ms  7      (0.1%)

--- Analysis & Recommendations ---
[INFO] High Allocations (7755/op). This will trigger frequent GC cycles and increase Max Latency.
----------------------------------

--- Throughput Timeline ---
Timeline: [▇▇▇▇█▇▇▇▇▇▇▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇] (Max: 374 ops/s)

--- PASS: TestParallelBenchmarkValidatorTransfer (44.57s)
    --- PASS: TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers (44.55s)
PASS
ok      github.com/hyperledger-labs/fabric-token-sdk/token/core/zkatdlog/nogh/v1/validator      44.594s

Comparison report

I have attached a PDF comparison report of the results across the different benchmark configurations:
benchmark_results.pdf

Let me know if this is good 🙏

…d comparison plots

Signed-off-by: Ankit Basu <ankitbasu14@gmail.com>
@adecaro
Contributor

adecaro commented Apr 13, 2026

Hi @neetance , great effort. I really appreciate it.
Please open a GitHub Issue about this feature.

Thanks a lot 🙏

@adecaro adecaro self-requested a review April 13, 2026 05:49
@adecaro adecaro self-assigned this Apr 13, 2026
@adecaro adecaro added this to the Q2/26 milestone Apr 13, 2026
@adecaro adecaro force-pushed the perf/benchmark-executor-metrics-and-report branch from a4921ee to 72b7588 Compare April 13, 2026 05:49
@neetance
Contributor Author

Thanks for the feedback @adecaro 🙏
I have opened issue #1542 related to this

@adecaro adecaro force-pushed the perf/benchmark-executor-metrics-and-report branch from 72b7588 to d54ddfa Compare April 13, 2026 12:09
@adecaro adecaro force-pushed the perf/benchmark-executor-metrics-and-report branch from d54ddfa to dec9b70 Compare April 13, 2026 17:42
@adecaro adecaro requested review from AkramBitar and Effi-S April 13, 2026 17:43
@adecaro
Contributor

adecaro commented Apr 13, 2026

@AkramBitar @Effi-S , please, review this. Thanks 🙏

@Effi-S

Effi-S commented Apr 15, 2026

Hi @neetance ,
Looks good :)
The report comes out nice.
You can tell you took great care in cleaning up the code here.
I only had some nit-picking comments.

#!/usr/bin/env python3
"""
plot_benchmark_results.py: Generates comparison plots across executor
strategies (serial, unbounded, pool) for each parallel benchmark test.

Hi @neetance,
Nice work :)
Can we add a brief explanation of what each strategy means?

def _next_run():
n = run_counter[0]
run_counter[0] += 1
return n

Nice way to remove the global I that was used.
If you want to make this even cleaner, you can try:

from itertools import count
RUN_COUNTER = count(0)
...
n = next(RUN_COUNTER)

Comment thread on cmd/benchmarking/run_benchmarks.py (outdated)

# --- Unit conversion ---
timestamp_str = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
output_folder_name = f"benchmark_logs_{timestamp_str}"

We can shorten this to:

output_folder_name = f"benchmark_logs_{datetime.now():%F_%H-%M-%S}"

validator_benchmarks_folder = os.path.join(TOKENSDK_ROOT, "token/core/zkatdlog/nogh/v1/validator")
issuer_benchmarks_folder = os.path.join(TOKENSDK_ROOT, "token/core/zkatdlog/nogh/v1/issue")
validator_benchmarks_folder= os.path.join(TOKENSDK_ROOT, "token/core/zkatdlog/nogh/v1/validator")


pathlib can be used instead of os.path here

@Effi-S Effi-S self-requested a review April 15, 2026 10:29
@adecaro
Copy link
Copy Markdown
Contributor

adecaro commented Apr 15, 2026

Hi @neetance , in the Unbounded case why do we have

Goroutines Created  0               Net goroutines above baseline during recording

I would expect many more than in the bounded case, no?

…mprove benchmarking scripts

Signed-off-by: Ankit Basu <ankitbasu14@gmail.com>
@neetance neetance force-pushed the perf/benchmark-executor-metrics-and-report branch from dec9b70 to bdfd597 Compare April 15, 2026 17:26
@neetance
Contributor Author

neetance commented Apr 15, 2026

Hi @adecaro, @Effi-S, thanks for your reviews; I really appreciate the suggestions to improve and clean up the code even further 🙏
As for the goroutine count showing 0 during unbounded executor tests: the goroutines spawned for unbounded runs were numerous but short-lived, so they weren't being captured properly. To be more specific, by the time time.Sleep(cfg.Duration) finished and we sampled again, the unbounded goroutines spawned during proof generation had already finished and exited.
To fix this, I now track goroutine creation continuously using the timeline monitor goroutine that already runs every second, and record the peak observed during recording. The peakGoroutines atomic tracks the highest value seen across all ticks, and the baseline is subtracted to give the net executor-created goroutines above the framework baseline.

I ran the benchmark again and this time we see a high number of goroutines:

go test ./token/core/zkatdlog/nogh/v1/validator     -test.run=TestParallelBenchmarkValidatorTransfer     -test.v -test.timeout 0     -bits="32" -curves="BLS12_381_BBS_GURVY"     -num_inputs="2" -num_outputs="2"     -workers="10" -duration="30s" -setup_samples=128     -executor="unbounded"
=== RUN   TestParallelBenchmarkValidatorTransfer
=== RUN   TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers
Metric           Value     Description
------           -----     -----------
Workers          10        
Total Ops        9871      (Robust Sample)
Duration         30.025s   (Good Duration)
Real Throughput  328.76/s  Observed Ops/sec (Wall Clock)
Pure Throughput  329.08/s  Theoretical Max (Low Overhead)

Latency Distribution:
 Min           18.917731ms  
 P50 (Median)  29.912486ms  
 Average       30.388176ms  
 P95           37.133239ms  
 P99           44.864432ms  
 P99.9         58.516808ms  
 Max           63.665433ms  (Stable Tail)

Stability Metrics:
 Std Dev  4.180158ms  
 IQR      4.66102ms   Interquartile Range
 Jitter   3.778691ms  Avg delta per worker
 CV       13.76%      Moderate Variance (10-20%)

System Health & Reliability:
 Error Rate          0.0000%         (100% Success) (0 errors)
 Memory              713745 B/op     Allocated bytes per operation
 Allocs              7754 allocs/op  Allocations per operation
 Alloc Rate          221.12 MB/s     Memory pressure on system
 GC Overhead         4.97%           (High GC Pressure)
 GC Pause            1.493270059s    Total Stop-The-World time
 GC Cycles           2995            Full garbage collection cycles
 Goroutines Created  426             Net goroutines above baseline during recording

Latency Heatmap (Dynamic Range):
Range                     Freq  Distribution Graph
 18.917731ms-20.101144ms  8      (0.1%)
 20.101144ms-21.358586ms  22     (0.2%)
 21.358586ms-22.694689ms  52    █ (0.5%)
 22.694689ms-24.114373ms  215   ████ (2.2%)
 24.114373ms-25.622866ms  514   █████████ (5.2%)
 25.622866ms-27.225724ms  1109  █████████████████████ (11.2%)
 27.225724ms-28.92885ms   1886  ████████████████████████████████████ (19.1%)
 28.92885ms-30.738516ms   2061  ████████████████████████████████████████ (20.9%)
 30.738516ms-32.661388ms  1732  █████████████████████████████████ (17.5%)
 32.661388ms-34.704546ms  1120  █████████████████████ (11.3%)
 34.704546ms-36.875515ms  619   ████████████ (6.3%)
 36.875515ms-39.182291ms  236   ████ (2.4%)
 39.182291ms-41.63337ms   113   ██ (1.1%)
 41.63337ms-44.237777ms   70    █ (0.7%)
 44.237777ms-47.005105ms  49     (0.5%)
 47.005105ms-49.945546ms  28     (0.3%)
 49.945546ms-53.069928ms  11     (0.1%)
 53.069928ms-56.389758ms  10     (0.1%)
 56.389758ms-59.917262ms  11     (0.1%)
 59.917262ms-63.665433ms  5      (0.1%)

--- Analysis & Recommendations ---
[INFO] High Allocations (7754/op). This will trigger frequent GC cycles and increase Max Latency.
----------------------------------

--- Throughput Timeline ---
Timeline: [▇▇▇▇█▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇] (Max: 355 ops/s)

--- PASS: TestParallelBenchmarkValidatorTransfer (44.93s)
    --- PASS: TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers (44.90s)
PASS
ok      github.com/hyperledger-labs/fabric-token-sdk/token/core/zkatdlog/nogh/v1/validator      44.956s

For serial and pool, the results are normal as expected:

  • Serial:

go test ./token/core/zkatdlog/nogh/v1/validator     -test.run=TestParallelBenchmarkValidatorTransfer     -test.v -test.timeout 0     -bits="32" -curves="BLS12_381_BBS_GURVY"     -num_inputs="2" -num_outputs="2"     -workers="10" -duration="30s" -setup_samples=128     -executor="serial"
=== RUN   TestParallelBenchmarkValidatorTransfer
=== RUN   TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers
Metric           Value     Description
------           -----     -----------
Workers          10        
Total Ops        10140     (Robust Sample)
Duration         30.018s   (Good Duration)
Real Throughput  337.79/s  Observed Ops/sec (Wall Clock)
Pure Throughput  338.03/s  Theoretical Max (Low Overhead)

Latency Distribution:
 Min           21.982059ms  
 P50 (Median)  29.299523ms  
 Average       29.582798ms  
 P95           34.318476ms  
 P99           37.22537ms   
 P99.9         42.521368ms  
 Max           48.446419ms  (Stable Tail)

Stability Metrics:
 Std Dev  2.678099ms  
 IQR      3.436592ms  Interquartile Range
 Jitter   2.529674ms  Avg delta per worker
 CV       9.05%       (Acceptable 5-10%)

System Health & Reliability:
 Error Rate          0.0000%         (100% Success) (0 errors)
 Memory              713366 B/op     Allocated bytes per operation
 Allocs              7746 allocs/op  Allocations per operation
 Alloc Rate          227.33 MB/s     Memory pressure on system
 GC Overhead         4.29%           (High GC Pressure)
 GC Pause            1.289175266s    Total Stop-The-World time
 GC Cycles           3373            Full garbage collection cycles
 Goroutines Created  14              Net goroutines above baseline during recording

Latency Heatmap (Dynamic Range):
Range                     Freq  Distribution Graph
 21.982059ms-22.867992ms  3      (0.0%)
 22.867992ms-23.78963ms   25     (0.2%)
 23.78963ms-24.748413ms   110   ██ (1.1%)
 24.748413ms-25.745837ms  371   ████████ (3.7%)
 25.745837ms-26.783461ms  828   ██████████████████ (8.2%)
 26.783461ms-27.862902ms  1422  ███████████████████████████████ (14.0%)
 27.862902ms-28.985849ms  1820  ████████████████████████████████████████ (17.9%)
 28.985849ms-30.154053ms  1755  ██████████████████████████████████████ (17.3%)
 30.154053ms-31.369338ms  1528  █████████████████████████████████ (15.1%)
 31.369338ms-32.633602ms  1055  ███████████████████████ (10.4%)
 32.633602ms-33.94882ms   595   █████████████ (5.9%)
 33.94882ms-35.317044ms   337   ███████ (3.3%)
 35.317044ms-36.740411ms  156   ███ (1.5%)
 36.740411ms-38.221144ms  75    █ (0.7%)
 38.221144ms-39.761554ms  26     (0.3%)
 39.761554ms-41.364046ms  16     (0.2%)
 41.364046ms-43.031123ms  12     (0.1%)
 43.031123ms-44.765387ms  4      (0.0%)
 44.765387ms-46.569547ms  1      (0.0%)
 46.569547ms-48.446419ms  1      (0.0%)

--- Analysis & Recommendations ---
[INFO] High Allocations (7746/op). This will trigger frequent GC cycles and increase Max Latency.
[PASS] RunBenchmark looks healthy and statistically sound.
----------------------------------

--- Throughput Timeline ---
Timeline: [▇▇▇▇█▇▇▇▇▇▇▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇] (Max: 358 ops/s)

--- PASS: TestParallelBenchmarkValidatorTransfer (56.83s)
    --- PASS: TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers (56.81s)
PASS
ok      github.com/hyperledger-labs/fabric-token-sdk/token/core/zkatdlog/nogh/v1/validator      56.854s
  • Pool:

go test ./token/core/zkatdlog/nogh/v1/validator     -test.run=TestParallelBenchmarkValidatorTransfer     -test.v -test.timeout 0     -bits="32" -curves="BLS12_381_BBS_GURVY"     -num_inputs="2" -num_outputs="2"     -workers="10" -duration="30s" -setup_samples=128     -executor="pool"
=== RUN   TestParallelBenchmarkValidatorTransfer
=== RUN   TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers
Metric           Value     Description
------           -----     -----------
Workers          10        
Total Ops        10484     (Robust Sample)
Duration         30.021s   (Good Duration)
Real Throughput  349.23/s  Observed Ops/sec (Wall Clock)
Pure Throughput  349.47/s  Theoretical Max (Low Overhead)

Latency Distribution:
 Min           19.365139ms  
 P50 (Median)  28.225077ms  
 Average       28.614823ms  
 P95           34.912654ms  
 P99           39.915883ms  
 P99.9         47.75586ms   
 Max           51.533005ms  (Stable Tail)

Stability Metrics:
 Std Dev  3.605085ms  
 IQR      4.448442ms  Interquartile Range
 Jitter   3.455528ms  Avg delta per worker
 CV       12.60%      Moderate Variance (10-20%)

System Health & Reliability:
 Error Rate          0.0000%         (100% Success) (0 errors)
 Memory              714843 B/op     Allocated bytes per operation
 Allocs              7780 allocs/op  Allocations per operation
 Alloc Rate          234.94 MB/s     Memory pressure on system
 GC Overhead         4.62%           (High GC Pressure)
 GC Pause            1.38583287s     Total Stop-The-World time
 GC Cycles           3012            Full garbage collection cycles
 Goroutines Created  107             Net goroutines above baseline during recording

Latency Heatmap (Dynamic Range):
Range                     Freq  Distribution Graph
 19.365139ms-20.336389ms  12     (0.1%)
 20.336389ms-21.356353ms  41     (0.4%)
 21.356353ms-22.427473ms  143   ███ (1.4%)
 22.427473ms-23.552314ms  349   ███████ (3.3%)
 23.552314ms-24.733571ms  712   ████████████████ (6.8%)
 24.733571ms-25.974073ms  1146  █████████████████████████ (10.9%)
 25.974073ms-27.276793ms  1587  ███████████████████████████████████ (15.1%)
 27.276793ms-28.64485ms   1772  ████████████████████████████████████████ (16.9%)
 28.64485ms-30.081521ms   1605  ████████████████████████████████████ (15.3%)
 30.081521ms-31.590248ms  1248  ████████████████████████████ (11.9%)
 31.590248ms-33.174645ms  842   ███████████████████ (8.0%)
 33.174645ms-34.838506ms  489   ███████████ (4.7%)
 34.838506ms-36.585818ms  261   █████ (2.5%)
 36.585818ms-38.420765ms  136   ███ (1.3%)
 38.420765ms-40.347743ms  55    █ (0.5%)
 40.347743ms-42.371369ms  40     (0.4%)
 42.371369ms-44.496488ms  24     (0.2%)
 44.496488ms-46.728192ms  9      (0.1%)
 46.728192ms-49.071826ms  7      (0.1%)
 49.071826ms-51.533005ms  6      (0.1%)

--- Analysis & Recommendations ---
[INFO] High Allocations (7780/op). This will trigger frequent GC cycles and increase Max Latency.
----------------------------------

--- Throughput Timeline ---
Timeline: [▇█▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▇▆▆▇▇▇▇▇▇▇▇] (Max: 375 ops/s)

--- PASS: TestParallelBenchmarkValidatorTransfer (44.64s)
    --- PASS: TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers (44.62s)
PASS
ok      github.com/hyperledger-labs/fabric-token-sdk/token/core/zkatdlog/nogh/v1/validator      44.668s

I also applied the refactoring changes suggested by @Effi-S to make the code cleaner, reran the script, and everything is passing.

Let me know if this looks good 😄


Development

Successfully merging this pull request may close these issues.

Feature: Executor-aware benchmarking and performance analysis for cryptographic workloads
